Chapter 14 HIERARCHICAL TEXT CLASSIFICATION METHODS AND THEIR SPECIFICATION
Abstract
Hierarchical text classification assigns text documents to the categories of a given category tree based on their content. When a large number of categories is organized as a tree, hierarchical text classification helps users find information more quickly and accurately. Nevertheless, past hierarchical text classification methods have often been constructed in a proprietary manner, and their construction steps often involve human effort and are not completely automated. In this chapter, we therefore propose a specification language known as HCL (Hierarchical Classification Language). HCL is designed to describe a hierarchical classification method, including the definition of a category tree and the training of the classifiers associated with its categories. Using HCL, a hierarchical classification method can be materialized easily with the help of a method generator system.
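The general idea of a category tree with one classifier per category can be illustrated with a minimal Python sketch. This is not HCL and not the authors' method: the `Category` class, the toy word-count scorer, the top-down `classify` routine, and the example categories and documents are all hypothetical, chosen only to show how a document might be routed from the root to a leaf.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in the category tree with a toy bag-of-words classifier (hypothetical)."""
    name: str
    children: list["Category"] = field(default_factory=list)
    word_counts: Counter = field(default_factory=Counter)

    def train(self, labeled_docs: dict[str, list[str]]) -> None:
        """Collect word counts from this category's documents and from its subtree."""
        for child in self.children:
            child.train(labeled_docs)
            self.word_counts.update(child.word_counts)
        for doc in labeled_docs.get(self.name, []):
            self.word_counts.update(doc.lower().split())

    def score(self, text: str) -> float:
        """Share of the category's training word mass matched by the document's words."""
        total = sum(self.word_counts.values())
        if total == 0:
            return 0.0
        return sum(self.word_counts[w] for w in text.lower().split()) / total


def classify(root: Category, text: str) -> list[str]:
    """Walk top-down from the root, following the best-scoring child at each level."""
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.score(text))
        path.append(node.name)
    return path


# Assumed example tree and training documents, for illustration only.
root = Category("root", [
    Category("science", [Category("physics"), Category("biology")]),
    Category("sports", [Category("football"), Category("tennis")]),
])
root.train({
    "physics": ["quantum particle energy", "energy field particle"],
    "biology": ["cell gene protein"],
    "football": ["goal striker league"],
    "tennis": ["serve racket court"],
})

print(classify(root, "the particle has energy"))
```

In this sketch each internal node's classifier is trained on the documents of its whole subtree, so a decision at an upper level reflects everything beneath it. A real system would replace the word-count scorer with a trained classifier per category.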
Similar resources
Automated compound classification using a chemical ontology
BACKGROUND: Classification of chemical compounds into compound classes using structure-derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and, more recently, ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of bio...
Chapter 8 PROBABILISTIC MODELS FOR TEXT MINING
A number of probabilistic methods, such as LDA, hidden Markov models, and Markov random fields, have arisen in recent years for the probabilistic analysis of text data. This chapter provides an overview of a variety of probabilistic models for text mining. The chapter focuses more on the fundamental probabilistic techniques, and also covers their various applications to different text mining problems. So...
An evaluation of text classification methods for literary study
This article presents an empirical evaluation of text classification methods in the literary domain. The study compared the performance of two popular algorithms, naïve Bayes and support vector machines (SVMs), in two literary text classification tasks: the eroticism classification of Dickinson’s poems and the sentimentalism classification of chapters in early American novels. The algorithms were a...
Utilizing global and path information with language modelling for hierarchical text classification
Hierarchical text classification of a Web taxonomy is challenging because it is a very large-scale problem with hundreds of thousands of categories and associated documents. Furthermore, the conceptual levels of categories and the availability of training data for them vary widely. The narrow-down approach is the state-of-the-art that utilizes a search engine for generating candidates from the taxonomy and build...
Development of Compound Clustering Techniques Using Hybrid Soft-Computing Algorithms
Databases of molecular structures available to the pharmaceutical industry comprise millions of molecules. With the advent of combinatorial chemistry, a vast number of compounds can be available either physically or virtually, which can make screening all of them infeasible in terms of time and cost. Therefore, only a subset of the entire database that encompasses the full range of structural t...